Smart Paper Notes

RTN: Reinforced Transformer Network for Coronary CT Angiography Vessel-level Image Quality Assessment

Yiting Lu, Jun Fu, Xin Li, Wei Zhou, Sen Liu, Xinxin Zhang, Congfu Jia, Ying Liu, Zhibo Chen

Category: Computer Vision

2022-07-13

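Coronary CT angiography (CCTA) is susceptible to various distortions (e.g., artifacts and noise), which severely compromise the exact diagnosis of cardiovascular diseases. An appropriate CCTA vessel-level image quality assessment (CCTA VIQA) algorithm can be used to reduce the risk of misdiagnosis. The primary challenge of CCTA VIQA is that the local part of the coronary artery that determines the final quality is hard to locate. To tackle this challenge, we formulate CCTA VIQA as a multiple-instance learning (MIL) problem and exploit a Transformer-based MIL backbone (termed T-MIL) to aggregate the multiple instances along the coronary centerline into the final quality. However, not all instances are informative for the final quality. Some quality-irrelevant or negative instances interfere with exact quality assessment (e.g., instances that cover only background, or instances in which the coronary artery is not identifiable). We therefore propose a progressive reinforcement learning based instance discarding module (termed PRID) to progressively remove quality-irrelevant or negative instances for CCTA VIQA. Based on these two modules, we propose a Reinforced Transformer Network (RTN) for automatic CCTA VIQA with end-to-end optimization. Extensive experimental results demonstrate that the proposed method achieves state-of-the-art performance on a real-world CCTA dataset, exceeding previous MIL methods.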
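
The aggregation idea in this abstract can be illustrated with a minimal sketch. It is not the authors' T-MIL or PRID: softmax attention pooling stands in for the Transformer backbone, and a fixed threshold stands in for the reinforcement-learned discarding policy. The function names, the `relevance_logits` input, and `keep_threshold` are all hypothetical.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def aggregate_quality(instance_scores, relevance_logits, keep_threshold=None):
    """Pool per-instance quality scores into one vessel-level score.

    instance_scores: hypothetical per-instance quality predictions.
    relevance_logits: hypothetical relevance logits (higher = more informative).
    keep_threshold: if set, instances whose logit is below it are discarded
        first (a fixed-threshold stand-in for a learned discarding policy).
    """
    pairs = list(zip(instance_scores, relevance_logits))
    if keep_threshold is not None:
        kept = [p for p in pairs if p[1] >= keep_threshold]
        pairs = kept or pairs  # never discard every instance
    weights = softmax([logit for _, logit in pairs])
    return sum(w * score for w, (score, _) in zip(weights, pairs))


# The third instance (e.g. background only) is dropped before pooling.
vessel_quality = aggregate_quality(
    [1.0, 3.0, 100.0], [0.0, 0.0, -5.0], keep_threshold=-1.0
)
```

Discarding before pooling, rather than relying on small attention weights alone, keeps an extreme score on an uninformative instance from leaking into the final value.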

Reward-Adaptive Reinforcement Learning: Dynamic Policy Gradient Optimization for Bipedal Locomotion

Changxin Huang, Guangrun Wang, Zhibo Zhou, Ronghui Zhang, Liang Lin

Category: Robotics

2021-07-05

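Controlling a non-statically stable bipedal robot is challenging because of the complex dynamics and multi-criterion optimization involved. Recent work has demonstrated the effectiveness of deep reinforcement learning (DRL) for both simulated and physical robots. In these methods, the rewards from different criteria are usually summed to learn a single value function. However, this may lose the dependency information between the hybrid rewards and lead to a sub-optimal policy. In this work, we propose a novel reward-adaptive reinforcement learning method for biped locomotion that allows the control policy to be optimized simultaneously over multiple criteria through a dynamic mechanism. The method applies a multi-head critic to learn a separate value function for each reward component, which leads to a hybrid policy gradient. We further propose dynamic weights that allow each component to optimize the policy with a different priority. This hybrid and dynamic policy gradient (HDPG) design lets the agent learn more efficiently. We show that the proposed method outperforms summed-reward approaches and can be transferred to physical robots. Sim-to-real and MuJoCo results further demonstrate the effectiveness and generalization of HDPG.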
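
A minimal sketch of the idea of keeping one value estimate per reward component and then combining the per-component advantages with priorities. It is not the authors' HDPG: the one-step TD advantage and the simple priority normalization are assumptions chosen for illustration, and both function names are hypothetical.

```python
def component_advantages(rewards, values, next_values, gamma=0.99):
    """One-step TD advantage per reward component (one critic head each)."""
    return [r + gamma * nv - v for r, v, nv in zip(rewards, values, next_values)]


def weighted_advantage(advantages, priorities):
    """Combine per-component advantages using normalized priorities.

    priorities: hypothetical non-negative weights; how they are updated over
    training is left out of this sketch.
    """
    total = sum(priorities)
    if total <= 0:
        raise ValueError("priorities must sum to a positive value")
    return sum(p / total * a for p, a in zip(priorities, advantages))


# Two reward components, e.g. forward velocity and energy use (illustrative).
advs = component_advantages([1.0, 0.0], [0.5, 0.2], [0.0, 0.0], gamma=0.9)
signal = weighted_advantage(advs, [1.0, 1.0])
```

With a summed reward, only the total would reach the critic; here each component keeps its own advantage, so changing the priorities changes which criterion drives the policy update.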

PCW-Net: Pyramid Combination and Warping Cost Volume for Stereo Matching

Zhelun Shen, Yuchao Dai, Xibin Song, Zhibo Rao, Dingfu Zhou, Liangjun Zhang

Category: Computer Vision

2020-06-23

Existing deep learning based stereo matching methods either focus on achieving optimal performance on the target dataset at the cost of poor generalization to other datasets, or handle cross-domain generalization by suppressing domain-sensitive features, which significantly sacrifices performance. To tackle these problems, we propose PCW-Net, a Pyramid Combination and Warping cost volume-based network that achieves good performance on both cross-domain generalization and stereo matching accuracy on various benchmarks. In particular, our PCW-Net is designed for two purposes. First, we construct combination volumes on the upper levels of the pyramid and develop a cost volume fusion module to integrate them for initial disparity estimation. Fusing multi-scale combination volumes covers multi-scale receptive fields, so domain-invariant features can be extracted. Second, we construct the warping volume at the last level of the pyramid for disparity refinement. The proposed warping volume narrows the residue searching range from the initial disparity searching range to a fine-grained one, which dramatically eases the difficulty of finding the correct residue in an unconstrained residue searching space. When trained on synthetic datasets and generalized to unseen real datasets, our method shows strong cross-domain generalization and outperforms existing state-of-the-art methods by a large margin. After fine-tuning on the real datasets, our method ranks first on KITTI 2012, second on KITTI 2015, and first on Argoverse among all published methods as of 7 March 2022. The code will be available at https://github.com/gallenszl/PCWNet.

Semantic-aware Message Broadcasting for Efficient Unsupervised Domain Adaptation

Xin Li, Cuiling Lan, Guoqiang Wei, Zhibo Chen

Category: Computer Vision | Artificial Intelligence

2022-12-06

The vision transformer has demonstrated great potential in a wide range of vision tasks. However, it inevitably suffers from poor generalization when a distribution shift occurs at test time (i.e., on out-of-distribution data). To mitigate this issue, we propose a novel method, Semantic-aware Message Broadcasting (SAMB), which enables more informative and flexible feature alignment for unsupervised domain adaptation (UDA). Specifically, we study the attention module in the vision transformer and notice that an alignment space built on a single global class token lacks flexibility: the token exchanges information with all image tokens in the same manner and ignores the rich semantics of different regions. In this paper, we aim to improve the richness of the alignment features by enabling semantic-aware adaptive message broadcasting. We introduce a set of learned group tokens as nodes that aggregate global information from all image tokens, and encourage different group tokens to adaptively focus their message broadcasting on different semantic regions. In this way, the group tokens learn more informative and diverse features for effective domain alignment. Moreover, we systematically study the effects of adversarial-based feature alignment (ADA) and pseudo-label based self-training (PST) on UDA. We find that a simple two-stage training strategy that combines ADA and PST can further improve the adaptation capability of the vision transformer. Extensive experiments on DomainNet, OfficeHome, and VisDA-2017 demonstrate the effectiveness of our methods for UDA.

A Late Multi-Modal Fusion Model for Detecting Hybrid Spam E-mail

Zhibo Zhang, Ernesto Damiani, Hussam Al Hamadi, Chan Yeob Yeun, Fatma Taher

Category: Artificial Intelligence

2022-10-26

In recent years, spammers have tried to obfuscate their intent by introducing hybrid spam e-mails that combine image and text parts, which are more challenging to detect than e-mails containing only text or only images. The motivation behind this research is to design an effective approach for filtering out hybrid spam e-mails, avoiding situations where traditional text-only or image-only filters fail to detect them. To the best of our knowledge, only a few studies have been conducted with the goal of detecting hybrid spam e-mails. Ordinarily, Optical Character Recognition (OCR) technology is used to eliminate the image parts of spam by transforming images into text. However, although OCR scanning is a very successful technique for processing text-and-image hybrid spam, it is not an effective solution for large volumes because of the CPU power and execution time required to scan e-mail files. OCR techniques are also not always reliable in the transformation process. To address these problems, we propose new late multi-modal fusion training frameworks for a text-and-image hybrid spam e-mail filtering system and compare them with the classical early fusion detection frameworks based on the OCR method. A Convolutional Neural Network (CNN) and Continuous Bag of Words were used to extract features from the image and text parts of hybrid spam, respectively, and the generated features were fed to a sigmoid layer and to machine learning classifiers, including Random Forest (RF), Decision Tree (DT), Naive Bayes (NB), and Support Vector Machine (SVM), to classify each e-mail as ham or spam.
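
Late fusion in general means that each modality is scored on its own and only the outputs are combined. The sketch below shows one common form, a weighted average of per-modality spam probabilities. It is an assumption for illustration, not the fusion rule used in the paper, and `late_fusion`, `text_weight`, and `threshold` are hypothetical names.

```python
def late_fusion(p_text, p_image, text_weight=0.5, threshold=0.5):
    """Fuse two per-modality spam probabilities into one decision.

    p_text, p_image: spam probabilities from separate text and image models.
    text_weight: share given to the text model; the image model gets the rest.
    Returns (label, fused_probability).
    """
    if not 0.0 <= text_weight <= 1.0:
        raise ValueError("text_weight must be in [0, 1]")
    fused = text_weight * p_text + (1.0 - text_weight) * p_image
    label = "spam" if fused >= threshold else "ham"
    return label, fused


# Text looks harmless, but the embedded image is strongly spam-like.
label, fused = late_fusion(0.2, 0.9)
```

Because the image branch is scored directly, no OCR pass is needed to turn the image into text before classification.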

Taking a Respite from Representation Learning for Molecular Property Prediction

Jianyuan Deng, Zhibo Yang, Hehe Wang, Iwao Ojima, Dimitris Samaras, Fusheng Wang

Category: Artificial Intelligence | Machine Learning

2022-09-26

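Artificial intelligence (AI) has been widely applied in drug discovery, where a major task is molecular property prediction. Despite the boom of AI techniques in molecular representation learning, some key aspects of molecular property prediction have not yet been carefully examined. In this study, we conduct a systematic comparison of three representative models, random forest, MolBERT, and GROVER, which use three major molecular representations: extended-connectivity fingerprints, SMILES strings, and molecular graphs, respectively. Notably, MolBERT and GROVER are pretrained on large-scale unlabeled molecule corpora in a self-supervised manner. In addition to commonly used molecular benchmark datasets, we assemble a suite of opioid-related datasets for downstream prediction evaluation. We first profile the datasets in terms of label distribution and structural analysis; we also examine the activity cliffs issue in the opioid-related datasets. We then train 4,320 predictive models and evaluate the usefulness of the learned representations. Furthermore, we explore model evaluation by studying the effect of statistical tests, evaluation metrics, and task settings. Finally, we dissect chemical space generalization into inter-scaffold and intra-scaffold generalization and measure prediction performance to evaluate model generalizability under both settings. By taking this respite, we reflect on the key aspects underlying molecular property prediction, in the hope of raising awareness that leads to better AI techniques in this field.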

MiNL: Micro-images based Neural Representation for Light Fields

Hanxin Zhu, Henan Wang, Zhibo Chen

Category: Computer Vision

2022-09-17

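Traditional representations of light fields fall into two types: explicit and implicit. Unlike explicit representations, which represent a light field as an array of sub-aperture images (SAIs) or as a lenslet image made of micro-images (MIs), implicit representations treat a light field as a neural network, which is inherently continuous in contrast to the discrete explicit representations. However, almost all current implicit representations of light fields use SAIs to train an MLP to learn a pixel-wise mapping from 4D spatial-angular coordinates to pixel colors, which is neither compact nor of low complexity. Instead, in this paper we propose MiNL, a novel MI-wise implicit neural representation for light fields that trains an MLP + CNN to learn a mapping from 2D MI coordinates to MI colors. Given the coordinates of a micro-image, MiNL outputs the RGB values of the corresponding micro-image. Encoding a light field in MiNL amounts to training a neural network to regress the micro-images, and decoding is a simple feed-forward operation. Compared with common pixel-wise implicit representations, MiNL is more compact and efficient, with faster decoding (an 80× to 180× speed-up) and better visual quality (a 1 to 4 dB PSNR improvement on average).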

Explainable Artificial Intelligence to Detect Image Spam Using Convolutional Neural Network

Zhibo Zhang, Ernesto Damiani, Hussam Al Hamadi, Chan Yeob Yeun, Fatma Taher

Category: Computer Vision

2022-09-07

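Image spam threat detection has remained a popular research area as the internet has expanded dramatically. This research presents an explainable framework for detecting spam images using Convolutional Neural Network (CNN) algorithms and Explainable Artificial Intelligence (XAI) algorithms. In this work, we use a CNN model to classify image spam, while post-hoc XAI methods, including Local Interpretable Model-agnostic Explanations (LIME) and SHapley Additive exPlanations (SHAP), are deployed to explain the decisions that the black-box CNN model makes about spam image detection. We train and then evaluate the proposed approach on a dataset of 6,636 images, comprising spam images and normal images collected from three different publicly available e-mail corpora. The experimental results show that the proposed framework achieves satisfactory detection results across different performance metrics, and that the model-independent XAI algorithms can explain the decisions of different models, which can serve as a basis for comparison in future studies.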

Person Monitoring by Full Body Tracking in Uniform Crowd Environment

Zhibo Zhang, Omar Alremeithi, Maryam Almheiri, Marwa Albeshr, Xiaoxiong Zhang, Sajid Javed, Naoufel Werghi

Category: Computer Vision

2022-09-02

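Full-body trackers are used for surveillance and security purposes, such as person-tracking robots. In the Middle East, uniform crowd environments are the norm, and they challenge state-of-the-art trackers. Despite the large improvements in tracker technology documented in the literature, these trackers have not been trained on datasets that capture such environments. In this work, we develop an annotated dataset with one specific target per video in a uniform crowd environment. The dataset was generated in four different scenarios in which the target mainly moves alongside the crowd, is sometimes occluded by it, and at other times is blocked from the camera's view by the crowd for a short period. After annotation, the dataset was used to evaluate and fine-tune a state-of-the-art tracker. Our results show that, compared with the initial pre-trained tracker, the fine-tuned tracker performs better on the evaluation dataset according to two quantitative evaluation metrics.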

Learned Lossless JPEG Transcoding via Joint Lossy and Residual Compression

Xiaoshuai Fan, Xin Li, Zhibo Chen

Category: Computer Vision

2022-08-24

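As a commonly used image compression format, JPEG is broadly applied in the transmission and storage of images. To further reduce the compression cost while maintaining the quality of JPEG images, lossless transcoding has been proposed to recompress already-compressed JPEG images in the DCT domain. However, previous works typically reduce the redundancy of DCT coefficients and optimize the probability prediction for entropy coding in a hand-crafted manner, which lacks generalization ability and flexibility. To tackle this challenge, we propose a learned lossless JPEG transcoding framework via joint lossy and residual compression. Instead of directly optimizing the entropy estimation, we focus on the redundancy that exists in the DCT coefficients. To the best of our knowledge, we are the first to use learned end-to-end lossy transform coding to reduce the redundancy of DCT coefficients in a compact representation domain. We also introduce residual compression for lossless transcoding, which adaptively learns the distribution of the residual DCT coefficients before compressing them with context-based entropy coding. The proposed transcoding architecture shows significant advantages in compressing JPEG images, thanks to the collaboration of learned lossy transform coding and residual entropy coding. Extensive experiments on multiple datasets demonstrate that the proposed framework saves about 21.49% of bits on average relative to JPEG compression, outperforming the typical lossless transcoding framework JPEG-XL by 3.51%.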
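
The principle that a lossy layer plus a residual layer is lossless overall can be shown with a small sketch. It is not the paper's framework: uniform quantization stands in for the learned lossy transform coding, the residual is left as plain integers instead of being entropy coded, and all function names and the `step` parameter are hypothetical.

```python
def lossy_encode(coeffs, step=8):
    """Coarsely quantize integer DCT coefficients (the lossy layer)."""
    return [round(c / step) for c in coeffs]


def lossy_decode(symbols, step=8):
    """Approximate reconstruction from the lossy layer alone."""
    return [s * step for s in symbols]


def transcode(coeffs, step=8):
    """Split coefficients into lossy symbols plus an integer residual."""
    symbols = lossy_encode(coeffs, step)
    approx = lossy_decode(symbols, step)
    residual = [c - a for c, a in zip(coeffs, approx)]
    return symbols, residual


def restore(symbols, residual, step=8):
    """Add the residual back to recover the original coefficients exactly."""
    return [a + r for a, r in zip(lossy_decode(symbols, step), residual)]


coeffs = [-415, 30, -61, 27, 56, -20, -2, 0]
symbols, residual = transcode(coeffs)
assert restore(symbols, residual) == coeffs
```

The residual values stay within half a quantization step, so they occupy a much narrower range than the original coefficients, which is what makes a separate residual coder worthwhile.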
